Abstract

Pathwise predictability and predictors for discrete time processes are studied in a deterministic
setting. It is suggested to approximate convolution sums over future times by convolution sums over
past times. It is shown that all band-limited processes are predictable in this sense, as well as
high-frequency processes with zero energy at low frequencies. In addition, a process of mixed type
can still be predicted if an ideal low-pass filter exists for this process.

1 Introduction

We study pathwise predictability of discrete time processes in a deterministic setting. It is well
known that certain restrictions on the frequency distribution can ensure additional opportunities
for prediction and interpolation of processes. The classical result is the
Nyquist-Shannon-Kotelnikov interpolation theorem for continuous time band-limited processes. It is
also known that the optimal prediction error for stationary Gaussian processes is zero in the case
of a degenerate spectral density. Related results can be found in Wainstein and Zubakov (1962),
Knab (1981), Papoulis (1985), Marvasti (1986), Vaidyanathan (1987), Lyman et al (2000, 2001),
Dokuchaev (2008, 2010).

The present paper extends to the discrete time setting the approach suggested for continuous time
processes in Dokuchaev (2008). We study a special kind of predictors such that convolution sums
over future times are approximated by convolution sums over past times representing historical
observations. We find some cases when this approximation can be made uniformly over a wide class of
input processes, including all band-limited processes and high-frequency processes. For processes
of mixed type, we find that similar predictability can be achieved when the model allows a low-pass
filter that acts as an ideal low-pass filter for this process. These results can be a useful
addition to the existing theory of band-limited processes. The novelty is that we consider
predictability of both high-frequency and band-limited processes in a weak sense, uniformly over
classes of input processes. In addition, we suggest a new type of predictor. Its kernel is given
explicitly in the frequency domain.



2 Definitions 

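Let $D = \{z \in \mathbf C : |z| < 1\}$, $D^c = \mathbf C \setminus D$,
$\mathbb T = \{z \in \mathbf C : |z| = 1\}$.

We denote by $\ell_r$ the set of all sequences $x = \{x(t)\}_{t=-\infty}^{\infty} \subset \mathbf C$
such that $\|x\|_{\ell_r} = \left(\sum_{t=-\infty}^{\infty} |x(t)|^r\right)^{1/r} < +\infty$ for
$r \in [1, \infty)$, or $\|x\|_{\ell_\infty} = \sup_t |x(t)| < +\infty$ for $r = +\infty$.

Let $\ell_r^+$ be the set of all sequences $x \in \ell_r$ such that $x(t) = 0$ for
$t = -1, -2, -3, \dots$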

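For complex valued sequences $x \in \ell_1$ or $x \in \ell_2$, we denote by $X = \mathcal Z x$ the
Z-transform

$$X(z) = \sum_{t=-\infty}^{\infty} x(t) z^{-t}, \quad z \in \mathbf C.$$

Respectively, the inverse $x = \mathcal Z^{-1} X$ is defined as

$$x(t) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{i\omega}) e^{i\omega t}\, d\omega, \quad t = 0, \pm 1, \pm 2, \dots$$

If $x \in \ell_2$, then $X|_{\mathbb T}$ is defined as an element of $L_2(\mathbb T)$.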
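
As a minimal numerical sketch of the two formulas above (an illustration added here, not part of the construction), the following Python code evaluates the Z-transform of a finitely supported sequence on the unit circle and recovers the sequence from the inversion integral, approximated by an equally spaced Riemann sum. The helper names `z_transform` and `inverse_z_transform` and the sample sequence are assumptions made for the example.

```python
import cmath
import math

def z_transform(x, z):
    """X(z) = sum_t x(t) z^{-t} for a finitely supported sequence given as {t: x(t)}."""
    return sum(v * z ** (-t) for t, v in x.items())

def inverse_z_transform(X, t, n_grid=64):
    """Riemann sum for (1/(2*pi)) * integral over [-pi, pi) of X(e^{iw}) e^{iwt} dw."""
    total = 0j
    for j in range(n_grid):
        w = -math.pi + 2.0 * math.pi * j / n_grid
        total += X(cmath.exp(1j * w)) * cmath.exp(1j * w * t)
    return total / n_grid

# Hypothetical test sequence with support {-1, 0, 2}
x = {-1: 0.5, 0: 1.0, 2: -0.25}
X = lambda z: z_transform(x, z)
recovered = {t: inverse_z_transform(X, t) for t in range(-3, 4)}
```

For a sequence whose support is shorter than `n_grid`, the Riemann sum reproduces the sequence up to rounding, because the sampled exponentials are orthogonal on the grid.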

Let $H^r(D^c)$ be the Hardy space of functions that are holomorphic on $D^c$ including the point at
infinity (see, e.g., Duren (1970)). Note that the Z-transform defines a bijection between the
sequences from $\ell_2^+$ and the restrictions (i.e. traces) on $\mathbb T$ of the functions from
$H^2(D^c)$.

Definition 1 Let $\mathcal K$ be the class of all functions $k \in \ell_\infty$ such that
$k(t) = 0$ for $t > 0$ and $K = \mathcal Z k$ is

$$K(z) = \frac{d(z)}{\delta(z)}, \tag{2.1}$$

where $d(\cdot)$ and $\delta(\cdot)$ are polynomials such that $\deg d < \deg \delta$, and if
$\delta(z) = 0$ for $z \in \mathbf C$ then $|z| > 1$.

The class includes all kernels $k$ representing anti-causal linear constant-coefficient difference
equations.

Definition 2 Let $\widehat{\mathcal K}$ be the class of sequences $\widehat k$ such that the
function $\widehat K(\cdot) = \mathcal Z \widehat k$ belongs to $H^\infty(D^c) \cap H^2(D^c)$.

It follows from the definitions that if $\widehat k \in \widehat{\mathcal K}$ then
$\widehat k(t) = 0$ for $t < 0$.

We are going to study linear predictors of the form
$\widehat y(t) = \sum_{s=-\infty}^{t} \widehat k(t-s) x(s)$ for the processes
$y(t) = \sum_{s=t}^{+\infty} k(t-s) x(s)$, where $k \in \mathcal K$ and
$\widehat k \in \widehat{\mathcal K}$. The predictors use historical values of the currently
observable process $x(\cdot)$.

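Definition 3 Let $\mathcal X = \{x(\cdot)\}$ be a class of sequences, let $r \in [1, +\infty]$, and
let $\bar{\mathcal K} \subset \mathcal K$ be a class of sequences.

(i) We say that the class $\mathcal X$ is $\ell_r$-predictable in the weak sense with respect to
the class $\bar{\mathcal K}$ if, for any $k(\cdot) \in \bar{\mathcal K}$, there exists a sequence
$\{\widehat k_m(\cdot)\}_{m=1}^{+\infty} = \{\widehat k_m(\cdot, \mathcal X, k)\}_{m=1}^{+\infty} \subset \widehat{\mathcal K}$
such that

$$\|y - \widehat y_m\|_{\ell_r} \to 0 \quad \text{as } m \to +\infty \quad \forall x \in \mathcal X,$$

where

$$y(t) = \sum_{s=t}^{+\infty} k(t-s) x(s), \qquad \widehat y_m(t) = \sum_{s=-\infty}^{t} \widehat k_m(t-s) x(s).$$

(ii) Let the set $\mathcal Z(\mathcal X) = \{X(e^{i\omega}) = \mathcal Z x|_{\mathbb T},\ x \in \mathcal X\}$
be provided with a norm $\|\cdot\|$. We say that the class $\mathcal X$ is $\ell_r$-predictable in
the weak sense with respect to the class $\bar{\mathcal K}$ uniformly with respect to the norm
$\|\cdot\|$ if, for any $k(\cdot) \in \bar{\mathcal K}$ and $\varepsilon > 0$, there exists
$\widehat k(\cdot) = \widehat k(\cdot, \mathcal X, k, \|\cdot\|, \varepsilon) \in \widehat{\mathcal K}$
such that

$$\|y - \widehat y\|_{\ell_r} \le \varepsilon \|X\| \quad \forall x \in \mathcal X, \quad X = \mathcal Z x.$$

Here $y(\cdot)$ is the same as above, and $\widehat y(t) = \sum_{s=-\infty}^{t} \widehat k(t-s) x(s)$.

We call the functions $\widehat k(\cdot)$ in Definition 3 predictors or predicting kernels.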

3 The main result 

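Let $\Omega \in (0, \pi)$ be given, and let

$$\mathcal X_L = \{x(\cdot) \in \ell_2 : X(e^{i\omega}) = 0 \text{ if } |\omega| > \Omega,\ X = \mathcal Z x\},$$
$$\mathcal X_H = \{x(\cdot) \in \ell_2 : X(e^{i\omega}) = 0 \text{ if } |\omega| < \Omega,\ X = \mathcal Z x\}.$$

In particular, $\mathcal X_L$ is a class of band-limited processes, and $\mathcal X_H$ is a class
of high-frequency processes.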

3.1 Predictability of band-limited and high-frequency processes from $\ell_2$

Let $\mathcal K_0$ be the class of all functions $k \in \ell_\infty$ such that $k(t) = 0$ for
$t > 0$ and that $K = \mathcal Z k$ can be represented as

$$K(z) = \frac{z + b}{z + a}, \tag{3.1}$$

for some real $a \in (-\infty, -1) \cup (1, +\infty)$ and $b \in \mathbf R$.

Theorem 1 (i) The classes $\mathcal X_L$ and $\mathcal X_H$ are $\ell_2$-predictable in the weak
sense with respect to the class $\mathcal K_0$.

(ii) The classes $\mathcal X_L$ and $\mathcal X_H$ are $\ell_\infty$-predictable in the weak sense
with respect to the class $\mathcal K_0$ uniformly with respect to the norm
$\|X(e^{i\omega})\|_{L_2(-\pi,\pi)}$.

(iii) For any $q > 2$, the classes $\mathcal X_L$ and $\mathcal X_H$ are $\ell_2$-predictable in
the weak sense with respect to the class $\mathcal K_0$ uniformly with respect to the norm
$\|X(e^{i\omega})\|_{L_q(-\pi,\pi)}$.

The question arises how to find the predicting kernels. In the proof of Theorem 1, a possible
choice of the kernels is given explicitly via Z-transforms.

4 On a model with an ideal low-pass filter

Corollary 1 Assume a model with a process $x(\cdot)$ such that an observer is able to decompose it
as $x(t) = x_L(t) + x_H(t)$, where $x_L(\cdot) \in \mathcal X_L$ and $x_H(\cdot) \in \mathcal X_H$.
Then this observer would be able to predict (approximately, in the sense of weak predictability
with respect to the class $\mathcal K_0$) the values of $y(t) = \sum_{s=t}^{+\infty} k(t-s) x(s)$
for $k(\cdot) \in \mathcal K_0$ by predicting the processes
$y_L(t) = \sum_{s=t}^{+\infty} k(t-s) x_L(s)$ and $y_H(t) = \sum_{s=t}^{+\infty} k(t-s) x_H(s)$
separately. More precisely, the process $\widehat y(t) = \widehat y_L(t) + \widehat y_H(t)$ is the
prediction of $y(t)$, where $\widehat y_L(t) = \sum_{s=-\infty}^{t} \widehat k_L(t-s) x_L(s)$ and
$\widehat y_H(t) = \sum_{s=-\infty}^{t} \widehat k_H(t-s) x_H(s)$, and where $\widehat k_L(\cdot)$
and $\widehat k_H(\cdot)$ are predicting kernels whose existence for the processes $x_L(\cdot)$ and
$x_H(\cdot)$ is established above.

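Let $\chi_L(e^{i\omega}) = \mathbb 1_{\{|\omega| \le \Omega\}}$ and
$\chi_H(e^{i\omega}) = 1 - \chi_L(e^{i\omega}) = \mathbb 1_{\{|\omega| > \Omega\}}$, where
$\omega \in \mathbf R$; $\mathbb 1$ denotes the indicator function.

The assumptions of Corollary 1 mean that there are a low-pass filter and a high-pass filter with
the transfer functions $\chi_L$ and $\chi_H$ respectively, with $x(\cdot)$ as the input, i.e., that
the values $x_L(s)$ and $x_H(s)$ for $s \le t$ are available at time $t$, where

$$x_L(\cdot) = \mathcal Z^{-1} X_L, \qquad X_L(e^{i\omega}) = \chi_L(e^{i\omega}) X(e^{i\omega}),$$
$$x_H(\cdot) = \mathcal Z^{-1} X_H, \qquad X_H(e^{i\omega}) = \chi_H(e^{i\omega}) X(e^{i\omega}),$$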
and where $X = \mathcal Z x$. It follows that predictability in the weak sense with respect to the
class $\mathcal K_0$ is possible for any process $x(\cdot)$ that can be decomposed without error
into a band-limited process and a high-frequency process, i.e., when there is a low-pass filter
which behaves as an ideal filter for this process. (Since $x_H(t) = x(t) - x_L(t)$, existence of
the low-pass filter implies existence of the high-pass filter.) On the other hand, Corollary 1
implies that the existence of ideal low-pass filters is impossible for general processes, since
they cannot be predictable in the sense of Definition 3.

Clearly, processes $x(\cdot) \in \mathcal X_L \cup \mathcal X_H$ are automatically covered by
Corollary 1, i.e., the existence of the filters is not required for this case. For instance, we
have immediately that $x_L(\cdot) = x(\cdot)$ and $x_H(\cdot) = 0$ for band-limited processes.
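
The decomposition $x = x_L + x_H$ with the indicator transfer functions can be illustrated offline on one period of a periodic record, using a plain discrete Fourier transform as a stand-in for the Z-transform on the unit circle. This is only a sketch under that assumption: the function `split_low_high` is hypothetical, and it uses the whole record, including samples after time $t$, which is exactly the information an ideal filter would need and a real-time observer does not have.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * j * t / n) for t in range(n)) for j in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[j] * cmath.exp(2j * math.pi * j * t / n) for j in range(n)) / n for t in range(n)]

def split_low_high(x, Omega):
    """Split one period of a periodic record into parts with |w| <= Omega and |w| > Omega."""
    n = len(x)
    X = dft(x)
    XL = []
    for j in range(n):
        w = 2.0 * math.pi * j / n
        if w > math.pi:
            w -= 2.0 * math.pi                      # map the bin frequency to (-pi, pi]
        XL.append(X[j] if abs(w) <= Omega else 0.0)  # multiply by the indicator chi_L
    xL = [v.real for v in idft(XL)]
    xH = [xt - lt for xt, lt in zip(x, xL)]          # x_H = x - x_L
    return xL, xH

# Hypothetical record: one slow and one fast sinusoid, cutoff Omega = pi/2
n, Omega = 64, math.pi / 2
low = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
high = [0.5 * math.cos(2 * math.pi * 20 * t / n) for t in range(n)]
x = [l + h for l, h in zip(low, high)]
xL, xH = split_low_high(x, Omega)
```

In this toy setting `xL` and `xH` reproduce the slow and the fast component up to rounding, since both sinusoids sit exactly on grid frequencies.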



5 Proofs 

It suffices to present a set of predicting kernels $\widehat k$ with the desired properties. We
will use a version of the construction introduced in Dokuchaev (2008) for the continuous time
setting. This construction is very straightforward and does not use the advanced theory of
$H^p$-spaces.

Let $\mathcal K_1$ be the class of all functions $k \in \mathcal K_0$ such that $K = \mathcal Z k$
can be represented as

$$K(z) = \frac{1}{z + a} \tag{5.1}$$

for some real $a \in (-\infty, -1) \cup (1, +\infty)$.

If $k \in \mathcal K_0$, then $K = \mathcal Z k$ can be represented as

$$K(z) = \frac{z + b}{z + a} = \frac{z + a + b - a}{z + a} = 1 + \frac{c}{z + a},$$

with $a \in (-\infty, -1) \cup (1, +\infty)$, $b \in \mathbf R$, and $c = b - a$. It follows that
the process $y(t)$ for $k \in \mathcal K_0$ can be represented as
$y(t) = x(t) + c \sum_{s=t}^{+\infty} k_1(t-s) x(s)$, where $k_1 \in \mathcal K_1$. Therefore, it
suffices to prove the theorem for $k \in \mathcal K_1$ only.
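
A short numerical check of this reduction may help. Expanding $1/(z+a)$ in powers of $z$ for $|a| > 1$ gives the anti-causal kernel $k_1(-n) = a^{-1}(-1/a)^n$, $n \ge 0$; the Python sketch below sums this series and applies the representation $y(t) = x(t) + c\sum_{s \ge t} k_1(t-s)x(s)$ to a complex exponential input. The function names and the test values are assumptions made for illustration.

```python
import cmath

def k1(t, a):
    """Kernel with Z-transform 1/(z + a), |a| > 1: (1/a) * (-1/a)**(-t) for t <= 0, else 0."""
    return (1.0 / a) * (-1.0 / a) ** (-t) if t <= 0 else 0.0

def K1_series(z, a, n_terms=80):
    """Partial sum of sum_{t <= 0} k1(t) z^{-t} = sum_{n >= 0} k1(-n) z^n."""
    return sum(k1(-n, a) * z ** n for n in range(n_terms))

def y_direct(x, t, a, b, n_terms=80):
    """y(t) = x(t) + c * sum_{s >= t} k1(t - s) x(s) with c = b - a, for K(z) = (z + b)/(z + a)."""
    c = b - a
    return x(t) + c * sum(k1(-n, a) * x(t + n) for n in range(n_terms))

# For x(s) = exp(i*w*s) the output should be K(e^{iw}) * exp(i*w*t)
z = cmath.exp(0.7j)
x = lambda s: cmath.exp(0.7j * s)
y3 = y_direct(x, 3, 2.0, 0.5)
```

The series is truncated after 80 terms, which is far below rounding level for the values of $a$ used here because the terms decay like $|a|^{-n}$.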

Let $k(\cdot) \in \mathcal K_1$ and let $K = \mathcal Z k$ be defined by (5.1) for some
$a \in (-\infty, -1) \cup (1, +\infty)$.

Let $G = (-\Omega, \Omega)$ and let

$$\alpha = -\frac{1 + a\cos(\Omega)}{a + \cos(\Omega)}. \tag{5.2}$$

Let us show that $\alpha = f(a) \in (-1, 1)$. Clearly, the function

$$f(a) = -\frac{1 + a\cos(\Omega)}{a + \cos(\Omega)}$$

is such that $f'(a) = (1 - \cos^2(\Omega))/(a + \cos(\Omega))^2 > 0$ for all $a$ such that
$|a| > 1$, $f(-1) = 1$, $f(1) = -1$, and $f(\pm\infty) = -\cos(\Omega)$. These properties imply
that $\alpha = f(a) \in (-1, 1)$.
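Further, we have that $1 + a\alpha + (a + \alpha)\cos(\Omega) = 0$, and

$$\mathrm{sign}(a + \alpha)\,\bigl(1 + a\alpha + (a + \alpha)\cos(\omega)\bigr) > 0, \quad \omega \in G,$$
$$\mathrm{sign}(a + \alpha)\,\bigl(1 + a\alpha + (a + \alpha)\cos(\omega)\bigr) < 0, \quad \omega \in [-\pi, \pi] \setminus [-\Omega, \Omega]. \tag{5.3}$$

Set

$$V(z) = 1 - \exp\left(\gamma\, \mathrm{sign}(a + \alpha)\, \frac{z + a}{z + \alpha}\right), \quad \gamma \in \mathbf R. \tag{5.4}$$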
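
The following Python sketch checks these relations numerically. Here $\alpha$ is obtained by solving the identity $1 + a\alpha + (a+\alpha)\cos(\Omega) = 0$ for $\alpha$; the function names and the sampled values of $a$ and $\Omega$ are assumptions made for the illustration. The last check uses the fact that the exponent in (5.4) vanishes at $z = -a$, so that $V(-a) = 0$.

```python
import cmath
import math

def alpha_for(a, Omega):
    """Solve 1 + a*alpha + (a + alpha)*cos(Omega) = 0 for alpha (requires |a| > 1)."""
    return -(1.0 + a * math.cos(Omega)) / (a + math.cos(Omega))

def V(z, a, alpha, gamma):
    """V(z) = 1 - exp(gamma * sign(a + alpha) * (z + a)/(z + alpha))."""
    s = math.copysign(1.0, a + alpha)
    return 1.0 - cmath.exp(gamma * s * (z + a) / (z + alpha))

def signed_numerator(w, a, alpha):
    """sign(a + alpha) * (1 + a*alpha + (a + alpha)*cos(w)), the quantity in (5.3)."""
    s = math.copysign(1.0, a + alpha)
    return s * (1.0 + a * alpha + (a + alpha) * math.cos(w))

# Collect (alpha, identity residual, value inside G, value outside G, |V(-a)|)
checks = []
for a in (2.0, -3.0, 1.1):
    for Omega in (0.3, math.pi / 2, 2.5):
        al = alpha_for(a, Omega)
        residual = 1.0 + a * al + (a + al) * math.cos(Omega)
        inside = signed_numerator(0.5 * Omega, a, al)
        outside = signed_numerator(0.5 * (Omega + math.pi), a, al)
        checks.append((al, residual, inside, outside, abs(V(complex(-a), a, al, -5.0))))
```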

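Lemma 1 (i) $V(z) \in H^\infty(D^c)$ and
$\widehat K(z) = K(z)V(z) \in H^2(D^c) \cap H^\infty(D^c)$.

(ii) If $\gamma < 0$ and $\omega \in [-\Omega, \Omega]$, then $|V(e^{i\omega})| \le 2$. If
$\gamma > 0$ and $\omega \in [-\pi, \pi] \setminus (-\Omega, \Omega)$, then
$|V(e^{i\omega})| \le 2$.

(iii) If $\omega \in (-\Omega, \Omega)$, then $V(e^{i\omega}) \to 1$ as $\gamma \to -\infty$. If
$\omega \in [-\pi, \pi] \setminus [-\Omega, \Omega]$, then $V(e^{i\omega}) \to 1$ as
$\gamma \to +\infty$.

(iv) For any sufficiently small $\varepsilon > 0$, $V(e^{i\omega}) \to 1$ uniformly in
$\omega \in [-\Omega + \varepsilon, \Omega - \varepsilon]$ as $\gamma \to -\infty$, and
$V(e^{i\omega}) \to 1$ uniformly in
$\omega \in [-\pi, \pi] \setminus (-\Omega - \varepsilon, \Omega + \varepsilon)$ as
$\gamma \to +\infty$.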
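
Statements (ii)-(iv) can be probed numerically on a frequency grid. The sketch below is an illustration only: it takes $\alpha$ as the root of $1 + a\alpha + (a+\alpha)\cos(\Omega) = 0$, uses the hypothetical values $a = 2$ and $\Omega = 1$, and the helper names are assumptions.

```python
import cmath
import math

def V_on_circle(w, a, Omega, gamma):
    """V(e^{iw}) with alpha chosen as the root of 1 + a*alpha + (a + alpha)*cos(Omega) = 0."""
    alpha = -(1.0 + a * math.cos(Omega)) / (a + math.cos(Omega))
    s = math.copysign(1.0, a + alpha)
    z = cmath.exp(1j * w)
    return 1.0 - cmath.exp(gamma * s * (z + a) / (z + alpha))

def sup_abs_V(a, Omega, gamma, lo, hi, n=2001):
    """Largest |V(e^{iw})| over an equally spaced grid on [lo, hi]."""
    return max(abs(V_on_circle(lo + (hi - lo) * j / (n - 1), a, Omega, gamma)) for j in range(n))

def sup_dev_from_one(a, Omega, gamma, lo, hi, n=2001):
    """Largest |V(e^{iw}) - 1| over an equally spaced grid on [lo, hi]."""
    return max(abs(V_on_circle(lo + (hi - lo) * j / (n - 1), a, Omega, gamma) - 1.0)
               for j in range(n))

a, Omega = 2.0, 1.0
bound_low = sup_abs_V(a, Omega, -7.0, -Omega, Omega)        # gamma < 0 on [-Omega, Omega]
bound_high = sup_abs_V(a, Omega, 7.0, Omega, math.pi)       # gamma > 0 on [Omega, pi]
dev_moderate = sup_dev_from_one(a, Omega, -5.0, -0.8, 0.8)  # inner interval, eps = 0.2
dev_large = sup_dev_from_one(a, Omega, -50.0, -0.8, 0.8)
```

On the inner interval the deviation $|V - 1|$ equals $e^{\gamma\psi(\omega)}$ with $\psi > 0$, so it shrinks as $\gamma$ becomes more negative; the two last values make this visible.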

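Proof of Lemma 1. Clearly, $V \in H^\infty(D^c)$ and
$(z + a)^{-1} V(z) \in H^2(D^c) \cap H^\infty(D^c)$, since the pole of $(z + a)^{-1}$ is
compensated by multiplying with $V$. It follows that
$K(z)V(z) \in H^2(D^c) \cap H^\infty(D^c)$. Then statement (i) follows.

Further, for $\omega \in \mathbf R$,

$$\frac{e^{i\omega} + a}{e^{i\omega} + \alpha} = \frac{(e^{i\omega} + a)(e^{-i\omega} + \alpha)}{|e^{i\omega} + \alpha|^2} = \frac{1 + a\alpha + a e^{-i\omega} + \alpha e^{i\omega}}{|e^{i\omega} + \alpha|^2}.$$

Hence

$$\mathrm{Re}\, \frac{e^{i\omega} + a}{e^{i\omega} + \alpha} = \frac{1 + a\alpha + (a + \alpha)\cos(\omega)}{|e^{i\omega} + \alpha|^2}.$$

Then statements (ii)-(iv) follow from (5.3). This completes the proof of Lemma 1. □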

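Proof of Theorem 1. For $x(\cdot) \in \ell_2$, let $X = \mathcal Z x$, $k = \mathcal Z^{-1} K$,
$\widehat k = \mathcal Z^{-1} \widehat K$,

$$y(t) = \sum_{s=t}^{\infty} k(t-s) x(s), \qquad \widehat y(t) = \sum_{s=-\infty}^{t} \widehat k(t-s) x(s).$$

Let $Y = \mathcal Z y$, let $V$ and $\widehat K$ be as defined above, and let
$\widehat Y = \widehat K X$.

Let us consider the cases of $\mathcal X_L$ and $\mathcal X_H$ simultaneously. For the case of the
class $\mathcal X_L$, consider $\gamma < 0$ and assume that $\gamma \to -\infty$. Set
$\Gamma = [-\Omega, \Omega]$ for this case. For the case of the class $\mathcal X_H$, consider
$\gamma > 0$ and $\gamma \to +\infty$. Set $\Gamma = [-\pi, -\Omega] \cup [\Omega, \pi]$ for this
case.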






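Let $x(\cdot) \in \mathcal X_L$ or $x(\cdot) \in \mathcal X_H$. In both cases, Lemma 1 gives that
$|V(e^{i\omega})| \le 2$ for all $\omega \in \Gamma$. If $\gamma \to -\infty$ or
$\gamma \to +\infty$ respectively for the $\mathcal X_L$ or $\mathcal X_H$ cases, then
$V(e^{i\omega}) \to 1$ for a.e. $\omega \in \Gamma$, i.e., for a.e. $\omega$ such that
$X(e^{i\omega}) \ne 0$.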

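Let us prove (i). Since $K(e^{i\omega}) \in L_\infty(-\pi, \pi)$,
$\widehat K(e^{i\omega}) \in L_\infty(-\pi, \pi)$, and $X(e^{i\omega}) \in L_2(-\pi, \pi)$, we have
that $Y(e^{i\omega}) = K(e^{i\omega}) X(e^{i\omega}) \in L_2(-\pi, \pi)$ and
$\widehat Y(e^{i\omega}) = \widehat K(e^{i\omega}) X(e^{i\omega}) \in L_2(-\pi, \pi)$. By Lemma 1,
it follows that

$$\widehat Y(e^{i\omega}) \to Y(e^{i\omega}) \quad \text{for a.e. } \omega \in \mathbf R, \tag{5.5}$$

as $\gamma \to -\infty$ or $\gamma \to +\infty$ respectively for the $\mathcal X_L$ or
$\mathcal X_H$ cases. We have that

$$|\widehat K(e^{i\omega}) - K(e^{i\omega})| \le |V(e^{i\omega}) - 1|\,|K(e^{i\omega})| \le 2|K(e^{i\omega})|, \quad \omega \in \Gamma, \tag{5.6}$$
$$|\widehat Y(e^{i\omega}) - Y(e^{i\omega})| \le 2|Y(e^{i\omega})| = 2|K(e^{i\omega})|\,|X(e^{i\omega})|, \quad \omega \in \Gamma. \tag{5.7}$$

By (5.5), (5.7), and by the Lebesgue Dominated Convergence Theorem, it follows that

$$\|\widehat Y(e^{i\omega}) - Y(e^{i\omega})\|_{L_2(-\pi, \pi)} \to 0, \quad \text{i.e.,} \quad \|\widehat y - y\|_{\ell_2} \to 0 \tag{5.8}$$

as $\gamma \to -\infty$ or $\gamma \to +\infty$ respectively for the $\mathcal X_L$ or
$\mathcal X_H$ cases, where $\widehat y = \mathcal Z^{-1} \widehat Y$.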

Let us prove (ii)-(iii). Take $d = 1$ for (ii) and take $d = 2$ for (iii). If
$X(e^{i\omega}) \in L_\nu(-\pi, \pi)$ for $\nu > d$, then the Hölder inequality gives

$$\|\widehat Y(e^{i\omega}) - Y(e^{i\omega})\|_{L_d(-\pi, \pi)} \le \|\widehat K(e^{i\omega}) - K(e^{i\omega})\|_{L_\mu(\Gamma)}\, \|X(e^{i\omega})\|_{L_\nu(\Gamma)}, \tag{5.9}$$

where $\mu$ is such that $1/\mu + 1/\nu = 1/d$. By (5.6) and by the Lebesgue Dominated Convergence
Theorem again, it follows that

$$\|\widehat K(e^{i\omega}) - K(e^{i\omega})\|_{L_\mu(\Gamma)} \to 0 \quad \forall \mu \in [1, +\infty), \tag{5.10}$$

as $\gamma \to -\infty$ or $\gamma \to +\infty$ respectively for the $\mathcal X_L$ or
$\mathcal X_H$ cases. Then, by (5.9)-(5.10), it follows that the predicting kernels
$\widehat k(\cdot) = \widehat k(\cdot, \gamma) = \mathcal Z^{-1} \widehat K$ are such as required
in statements (ii)-(iii). This completes the proof of Theorem 1. □

Corollary 1 follows immediately from Theorem 1.
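
The convergence of $\widehat K$ to $K$ on the band can be observed numerically. The Python sketch below approximates the $L_2(-\Omega, \Omega)$ distance between $\widehat K = KV$ and $K(z) = 1/(z+a)$ by a midpoint rule for several negative values of $\gamma$. It is an illustration under stated assumptions: $\alpha$ is taken as the root of $1 + a\alpha + (a+\alpha)\cos(\Omega) = 0$, and the values $a = 2$, $\Omega = \pi/2$ and the function name `kernel_gap_l2` are hypothetical.

```python
import cmath
import math

def kernel_gap_l2(a, Omega, gamma, n=4000):
    """Midpoint-rule value of ||K_hat - K||_{L_2(-Omega, Omega)} for K(z) = 1/(z + a)."""
    alpha = -(1.0 + a * math.cos(Omega)) / (a + math.cos(Omega))
    s = math.copysign(1.0, a + alpha)
    h = 2.0 * Omega / n
    total = 0.0
    for j in range(n):
        z = cmath.exp(1j * (-Omega + (j + 0.5) * h))
        K = 1.0 / (z + a)
        K_hat = K * (1.0 - cmath.exp(gamma * s * (z + a) / (z + alpha)))
        total += abs(K_hat - K) ** 2 * h
    return math.sqrt(total)

# The gap should shrink as gamma tends to -infinity
gaps = [kernel_gap_l2(2.0, math.pi / 2, g) for g in (-1.0, -4.0, -16.0, -64.0)]
```

The decrease is slow near the band edges $\pm\Omega$, where the real part of the exponent in $V$ is close to zero; this is why the convergence in (5.10) is not uniform up to the edge.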

6 On the prediction error generated by a high-frequency noise

Let us estimate the prediction error for the case when the predictor (5.4) designed for a
band-limited process is applied to a process with a small high-frequency noise.

Let $\Omega \in (0, \pi)$ and $\nu \in [0, 1)$ be given. Let us consider a process
$x(\cdot) \in \ell_\infty$ such that $|X(e^{i\omega})| \le 1$ for $\omega \in G$ and
$|X(e^{i\omega})| \le \nu$ for $\omega \in [-\pi, \pi] \setminus G$, where $X = \mathcal Z x$ and
$G = (-\Omega, \Omega)$.

Assume that the predictor (5.4) is constructed under the hypothesis that $\nu = 0$ (i.e., that
$x(\cdot)$ is a band-limited process from $\mathcal X_L$), for some
$a \in \mathbf R \setminus [-1, 1]$. For an arbitrarily small $\varepsilon > 0$, we can find
$\gamma = \gamma(\varepsilon)$ such that if the hypothesis that $\nu = 0$ is correct, then

$$\|\widehat y - y\|_{\ell_\infty} \le \varepsilon, \tag{6.1}$$

where $y(\cdot)$ and $\widehat y(\cdot)$ are such as in Definition 3.

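Let us estimate the prediction error for the case when $\nu > 0$. We have that

$$\|\widehat y - y\|_{\ell_\infty} \le \|\widehat Y(e^{i\omega}) - Y(e^{i\omega})\|_{L_1(-\pi, \pi)},$$

where $Y = \mathcal Z y$ and $\widehat Y = \mathcal Z \widehat y$. Let
$\Omega_1 = \Omega - \varepsilon/(4\kappa)$ and $G_1 = (-\Omega_1, \Omega_1)$, with $\kappa$
defined below. By the assumptions on $X$, we have that

$$\|\widehat Y(e^{i\omega}) - Y(e^{i\omega})\|_{L_1(-\pi, \pi)} \le I_1 + I_2 + \nu I_3,$$

where

$$I_1 = \kappa \int_{G_1} e^{\gamma\psi(\omega)}\, d\omega, \qquad I_2 = \kappa \int_{G \setminus G_1} e^{\gamma\psi(\omega)}\, d\omega, \qquad I_3 = \kappa \int_{(-\pi, \pi) \setminus G} e^{\gamma\psi(\omega)}\, d\omega,$$

and where $\kappa = \max_\omega |K(e^{i\omega})|$,

$$\psi(\omega) = \mathrm{sign}(a + \alpha)\, \mathrm{Re}\, \frac{e^{i\omega} + a}{e^{i\omega} + \alpha}.$$

Note that $\psi(\omega) > 0$ for $\omega \in G$. Let $\psi_0 = \min_{\omega \in G_1} \psi(\omega)$
and let $\gamma = -\log(2\kappa/\varepsilon)/\psi_0$. Then $I_1 \le \varepsilon/2$.

Further, $I_2 \le \kappa\, \mathrm{mes}(G \setminus G_1) = \varepsilon/2$. Therefore, (6.1) holds
if $\nu = 0$.

The value $I_1 + I_2$ represents the forecast error when $\nu = 0$; this error can be made
arbitrarily small with $\gamma$ selected as above when $\varepsilon \to 0$.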

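Let us estimate $I_3$. Clearly,
$|\psi(\omega)| \le 1 + |a - \alpha|/|e^{i\omega} + \alpha| \le \mu$, where
$\mu = 1 + |a - \alpha|/(1 - |\alpha|)$. Hence

$$I_3 \le \kappa \int_{(-\pi, \pi) \setminus G} e^{-\gamma\mu}\, d\omega = \kappa \int_{(-\pi, \pi) \setminus G} e^{\frac{\log(2\kappa/\varepsilon)}{\psi_0}\mu}\, d\omega = 2\kappa(\pi - \Omega)\, e^{\frac{\log(2\kappa/\varepsilon)}{\psi_0}\mu}.$$

Hence

$$\nu I_3 \le 2\kappa\nu(\pi - \Omega) \left(\frac{2\kappa}{\varepsilon}\right)^{\mu/\psi_0}.$$

The value $\nu I_3$ represents the additional error caused by the presence of unexpected
high-frequency noise (when $\nu > 0$). It can be seen that if $\varepsilon \to 0$ then this error
is increasing as a polynomial of $\varepsilon^{-1}$ with the rate depending on $\alpha$ (defined by
$\Omega$ and $a$). If $\Omega \to \pi$ then $|\alpha| \to 1$ and $\mu \to +\infty$, and, for a
given $\varepsilon$, the error is increasing exponentially in $\mu$.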
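
To get a feeling for the size of this bound, the Python sketch below evaluates $\log_{10}$ of $2\kappa\nu(\pi - \Omega)(2\kappa/\varepsilon)^{\mu/\psi_0}$; the logarithm is used because the bound itself quickly exceeds the floating-point range. The sketch rests on several assumptions: $\alpha$ is the root of $1 + a\alpha + (a+\alpha)\cos(\Omega) = 0$, $K(z) = 1/(z+a)$ so that $\kappa = 1/(|a| - 1)$, the inner cutoff is taken as $\Omega_1 = \Omega - \varepsilon/(4\kappa)$, $\psi_0$ is found on a grid, and the function name and sample values are hypothetical.

```python
import math

def noise_bound_log10(a, Omega, eps, nu, n=2000):
    """log10 of 2*kappa*nu*(pi - Omega)*(2*kappa/eps)**(mu/psi0), the bound on nu*I_3."""
    alpha = -(1.0 + a * math.cos(Omega)) / (a + math.cos(Omega))
    s = math.copysign(1.0, a + alpha)
    kappa = 1.0 / (abs(a) - 1.0)            # max over w of |1/(e^{iw} + a)|
    omega1 = Omega - eps / (4.0 * kappa)    # inner cutoff: kappa * mes(G \ G_1) = eps/2

    def psi(w):
        # sign(a + alpha) * Re[(e^{iw} + a)/(e^{iw} + alpha)]
        num = 1.0 + a * alpha + (a + alpha) * math.cos(w)
        den = 1.0 + alpha * alpha + 2.0 * alpha * math.cos(w)
        return s * num / den

    psi0 = min(psi(-omega1 + 2.0 * omega1 * j / n) for j in range(n + 1))
    mu = 1.0 + abs(a - alpha) / (1.0 - abs(alpha))
    return (math.log10(2.0 * kappa * nu * (math.pi - Omega))
            + (mu / psi0) * math.log10(2.0 * kappa / eps))

# Hypothetical setting: a = 2, Omega = pi/2, noise level nu = 1e-3
log_bound_coarse = noise_bound_log10(2.0, math.pi / 2, 0.1, 1e-3)
log_bound_fine = noise_bound_log10(2.0, math.pi / 2, 0.01, 1e-3)
```

Even for a noise level of $10^{-3}$, the bound in this setting has hundreds of decimal digits and grows further as $\varepsilon$ decreases, which illustrates how conservative the estimate is and how sensitive the predictor can be to energy outside the band.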

7 Concluding remarks

• By (5.2), $|\alpha| \to 1$ as $\Omega \to \pi$, and the predictor suggested above loses its
feasibility as $\Omega \to \pi$. (In particular, the norm of $\widehat k$ tends to $+\infty$.)

• If $k(\cdot)$ is a real valued function, then $\widehat k$ is also real valued. This follows
from the fact that $\widehat K(\bar z) = \overline{\widehat K(z)}$, and, therefore,
$\widehat K(e^{-i\omega}) = \overline{\widehat K(e^{i\omega})}$.

• A similar approach can be applied to the case when $X(z)$ vanishes on some connected set
$I \subset \mathbb T$. In this case, the classes $\mathcal K_0$ and $\mathcal K_1$ have to be
replaced by similar classes with complex $a \in D^c$. For real valued kernels, it could be
meaningful to include the functions $K$ represented by the sums of two simple fractions, to ensure
that the process $\mathcal Z^{-1} K$ is real (i.e., that
$K(e^{i\omega}) = \overline{K(e^{-i\omega})}$).

• The predictors obtained above require the past values of $x(s)$ for all $s \in (-\infty, t]$.
In practice, $\sum_{s=-\infty}^{t} \widehat k(t-s) x(s)$ can be approximated by
$\sum_{s=-M}^{t} \widehat k(t-s) x(s)$ for large enough $M > 0$. In addition, the corresponding
transfer functions can be approximated by rational fraction polynomials.

• The system for the suggested predictors is stable, since the corresponding transfer functions
have poles in the domain $\{|z| < 1\}$ only. However, the suggested predictors are not robust. For
instance, if the predictor is designed for the class $\mathcal X_L$ and it is applied to a process
$x(\cdot) \notin \mathcal X_L$ with small non-zero energy at the frequencies outside
$[-\Omega, \Omega]$, then the error generated by the presence of this energy is increasing as
$\gamma \to -\infty$.

• The results of this paper can be applied to discrete time stationary random Gaussian processes.
In particular, assume that the spectral density of the underlying process $x(t)$ vanishes outside
the interval $[-\Omega, \Omega] \subset (-\pi, \pi)$. It is known that the minimal (optimal)
predicting error is zero in this case. The sequence of the predictors constructed above represents
a sequence of suboptimal predictors leading to vanishing prediction error.
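
The remark on truncated sums can be tried out numerically. The Python sketch below computes the predicting kernel $\widehat k = \mathcal Z^{-1}(KV)$ for $K(z) = 1/(z+a)$ by a Riemann sum for the inversion integral, keeps only the $M + 1$ most recent observations, and compares the prediction with the anti-causal sum $y(t)$ for a band-limited test input. Everything specific here is an assumption made for illustration: $\alpha$ is taken as the root of $1 + a\alpha + (a+\alpha)\cos(\Omega) = 0$, the input is $x(s) = \sin(\omega_0 s)/(\pi s)$ with $\omega_0 = \Omega/2$, the window is $s = t - M, \dots, t$, and the values $a = 2$, $\Omega = \pi/2$, $\gamma = -3$, $M = 200$ are arbitrary.

```python
import cmath
import math

def predictor_kernel(a, Omega, gamma, M, n_grid=1024):
    """k_hat(0..M) for K_hat(z) = V(z)/(z + a), by a Riemann sum for the inversion integral."""
    alpha = -(1.0 + a * math.cos(Omega)) / (a + math.cos(Omega))
    s = math.copysign(1.0, a + alpha)
    ws = [2.0 * math.pi * j / n_grid for j in range(n_grid)]
    vals = []
    for w in ws:
        z = cmath.exp(1j * w)
        vals.append((1.0 - cmath.exp(gamma * s * (z + a) / (z + alpha))) / (z + a))
    # a, alpha, gamma are real, so k_hat is real; the imaginary part is rounding noise
    return [sum(v * cmath.exp(1j * w * t) for v, w in zip(vals, ws)).real / n_grid
            for t in range(M + 1)]

def x_sinc(s, w0):
    """Band-limited test input sin(w0*s)/(pi*s); its spectrum is the indicator of [-w0, w0]."""
    return w0 / math.pi if s == 0 else math.sin(w0 * s) / (math.pi * s)

def y_future(t, a, w0, n_terms=80):
    """y(t) = sum_{s >= t} k(t - s) x(s) for K(z) = 1/(z + a), i.e. k(-n) = (1/a)(-1/a)^n."""
    return sum((1.0 / a) * (-1.0 / a) ** n * x_sinc(t + n, w0) for n in range(n_terms))

def y_predicted(t, k_hat, w0):
    """Truncated predictor: sum over s = t-M..t of k_hat(t - s) x(s)."""
    return sum(k_hat[n] * x_sinc(t - n, w0) for n in range(len(k_hat)))

a, Omega, gamma, M, w0 = 2.0, math.pi / 2, -3.0, 200, math.pi / 4
k_hat = predictor_kernel(a, Omega, gamma, M)
errors = [abs(y_future(t, a, w0) - y_predicted(t, k_hat, w0)) for t in (-3, 0, 5)]
```

With these values the kernel decays geometrically, so the truncation at $M = 200$ is negligible and the remaining error is governed by $e^{\gamma\psi(\omega)}$ on the band of the input. Making $\gamma$ more negative reduces this error for exactly band-limited inputs, at the price of the sensitivity to out-of-band energy noted in the remarks.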